A hierarchical unsupervised growing neural network for clustering gene expression patterns

Authors

  • Javier Herrero
  • Alfonso Valencia
  • Joaquín Dopazo
Abstract

MOTIVATION: We describe a new approach to the analysis of gene expression data from DNA array experiments, using an unsupervised neural network. DNA array technologies allow thousands of genes to be monitored rapidly and efficiently. One aim of these studies is the search for correlated gene expression patterns, which is usually achieved by clustering them. The Self-Organising Tree Algorithm (SOTA) (Dopazo,J. and Carazo,J.M. (1997) J. Mol. Evol., 44, 226-233) is a neural network that grows by adopting the topology of a binary tree. The result of the algorithm is a hierarchical clustering obtained with the accuracy and robustness of a neural network.

RESULTS: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going into the details of the lowest levels. The growth can be stopped at the desired hierarchical level. Moreover, a criterion for stopping the growth of the tree is provided, based on the approximate probability distribution obtained by randomisation of the original data set. By means of this criterion, statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm: the neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear in the number of items to be classified, it is especially suitable for dealing with very large amounts of data. The proposed method is very general and applies to any data, provided that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used.

AVAILABILITY: A server running the program can be found at: http://bioinfo.cnio.es/sotarray.
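The divisive growth described in the abstract can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `sota_sketch`, its parameters (`max_cells`, `epochs`, `eta_winner`, `eta_sister`), the use of Euclidean distance, and the "mean distance to the cell" measure of heterogeneity are all choices made here for illustration. Updates to ancestor nodes and the randomisation-based stopping criterion are omitted; growth simply stops at a requested number of cells.

```python
import numpy as np


def sota_sketch(data, max_cells=4, epochs=30, eta_winner=0.1, eta_sister=0.01):
    """Simplified SOTA-style divisive clustering (illustrative assumptions only).

    Returns (cells, labels): one average-like profile per leaf cell and, for
    every row of `data`, the index of its closest cell.
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    # Start with two sister cells, both initialised to the overall mean profile.
    cells = [mean.copy(), mean.copy()]
    sister = [1, 0]  # index of each cell's leaf sister, or -1 if it has none

    while True:
        # Adaptation: the winning cell moves strongly towards each profile,
        # its leaf sister (if any) moves weakly.
        for _ in range(epochs):
            for x in data:
                dists = [np.linalg.norm(x - w) for w in cells]
                win = int(np.argmin(dists))
                cells[win] += eta_winner * (x - cells[win])
                if sister[win] >= 0:
                    s = sister[win]
                    cells[s] += eta_sister * (x - cells[s])

        # Assign every profile to its closest cell.
        all_d = np.linalg.norm(data[:, None, :] - np.array(cells)[None, :, :], axis=2)
        labels = all_d.argmin(axis=1)
        if len(cells) >= max_cells:
            return np.array(cells), labels

        # Heterogeneity of a cell: mean distance of its members to the cell.
        member_d = all_d[np.arange(len(data)), labels]
        resources = [
            member_d[labels == k].mean() if np.any(labels == k) else 0.0
            for k in range(len(cells))
        ]
        worst = int(np.argmax(resources))

        # Growth: the most heterogeneous cell is replaced by two daughters
        # that both start as copies of it (one reuses the old index).
        old_sister = sister[worst]
        if old_sister >= 0:
            sister[old_sister] = -1  # its former sister is now an internal node
        cells.append(cells[worst].copy())
        sister[worst] = len(cells) - 1
        sister.append(worst)
```

Calling `sota_sketch(profiles, max_cells=2)` on a matrix with one expression profile per row resolves the top-level split first; raising `max_cells` refines the most heterogeneous cluster at each step, mirroring the top-to-bottom behaviour described above. The returned cell vectors approximate the average profiles of their clusters.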


Similar resources

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...


A dynamically growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles

MOTIVATION The increasing use of microarray technologies is generating large amounts of data that must be processed in order to extract useful and rational fundamental patterns of gene expression. Hierarchical clustering technology is one method used to analyze gene expression data, but traditional hierarchical clustering algorithms suffer from several drawbacks (e.g. fixed topology structure; ...


Hierarchical growing cell structures: TreeGCS

We propose a hierarchical, unsupervised clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of Fritzke. Our algorithm improves an inconsistency in the GCS algorithm, where the network topology is susceptible to the ordering of the input vectors. We demonstrate improved stability of the GCS foundation by alternating the input vector order on each presentatio...


Adaptation of Multilayer Perceptron Neural Network to unsupervised Clustering using a developed version of k-means algorithm

Cluster analysis plays a very important role in different fields and can be mandatory in others. This fact is due to the huge amount of web services, products and information created and provided on the internet and in addition the need of representation, visualization and reduction of large vectors. So in order to facilitate the treatment of information and reducing the research space, data mu...


Serendipity in Text and Audio Information Spaces: Organizing and Exploring High-Dimensional Data with the Growing Hierarchical Self-Organizing Map

The Self-Organizing Map is a very popular unsupervised neural network model for the analysis of high-dimensional input data as in data mining applications. However, at least two limitations have to be noted, which are caused, on the one hand, by the static architecture of this model, as well as, on the other hand, by the limited capabilities for the representation of hierarchical relations of t...



Journal title:
  • Bioinformatics

Volume 17, Issue 2

Pages -

Publication date: 2001